We present a solution to detect anomalous events in videos without needing to train a model offline. Specifically, our solution is based on a randomly-initialized multilayer perceptron that is optimized online to reconstruct video frames, pixel by pixel, from their frequency information. Based on the information shift between adjacent frames, an incremental learner is used to update the parameters of the multilayer perceptron after observing each frame, thus allowing anomalous events to be detected along the video stream. Traditional solutions that do not require offline training are limited to operating on videos with only a few abnormal frames. Our solution breaks this limit and achieves strong performance on benchmark datasets.
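The online scheme described above can be illustrated with a toy stand-in: a one-hidden-layer perceptron updated by plain SGD after every frame, with reconstruction loss as the anomaly signal. The sizes, learning rate, and update rule here are illustrative choices, not the paper's actual incremental learner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy online reconstructor: frames are flat vectors; the network is
# updated once per observed frame, so no offline training is needed.
D, H, LR = 16, 8, 0.05
W1 = rng.normal(0, 0.1, (H, D))
W2 = rng.normal(0, 0.1, (D, H))

def step(x):
    """One online update; returns the reconstruction loss before updating."""
    global W1, W2
    h = np.tanh(W1 @ x)
    err = W2 @ h - x
    loss = float(np.mean(err ** 2))
    gW2 = np.outer(err, h) * 2 / D              # d loss / d W2
    gpre = (W2.T @ err) * (1 - h ** 2) * 2 / D  # backprop through tanh
    W2 -= LR * gW2
    W1 -= LR * np.outer(gpre, x)
    return loss

losses = [step(rng.normal(0, 0.1, D)) for _ in range(200)]  # normal stream
spike = step(rng.normal(0, 1.0, D))                         # unfamiliar frame
assert spike > losses[-1]  # the unfamiliar frame reconstructs worse
```

A frame unlike anything seen so far reconstructs poorly, so a loss spike serves as the detection signal.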
The perception of auditory events inherently relies on both audio and visual cues. Many existing multimodal approaches process each modality with a modality-specific model and then fuse the embeddings to encode the joint information. Instead, we employ heterogeneous graphs to explicitly capture the spatial and temporal relationships between the modalities and to represent detailed information about the underlying signal. We address the task of visually-aware acoustic event classification with a heterogeneous graph approach, which offers a compact, efficient, and scalable way to represent data in graph form. Through heterogeneous graphs, we show how to efficiently model both intra- and inter-modality relationships at spatial and temporal scales. Our model can easily be adapted to different event scales through the relevant hyperparameters. Experiments on AudioSet, a large-scale benchmark, show that our model achieves state-of-the-art performance.
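The heterogeneous graph idea can be sketched with plain Python containers: nodes typed by modality, edges typed by relation. The node features and relation labels below are illustrative stand-ins, not the paper's construction; a graph library such as DGL or PyTorch Geometric would normally hold such typed structures.

```python
# Minimal heterogeneous graph for one audio-visual clip.
graph = {"nodes": [], "edges": []}

def add_node(ntype, feat):
    """Register a node with a modality type and a feature vector."""
    graph["nodes"].append({"id": len(graph["nodes"]), "type": ntype, "feat": feat})
    return graph["nodes"][-1]["id"]

def add_edge(src, dst, etype):
    """Register a typed (relation-labelled) edge."""
    graph["edges"].append((src, dst, etype))

# Two audio segments and one video frame from the same clip.
a0 = add_node("audio", [0.1, 0.4])
a1 = add_node("audio", [0.2, 0.3])
v0 = add_node("video", [0.7, 0.1])
add_edge(a0, a1, "audio-temporal")  # intra-modality edge across time
add_edge(a0, v0, "audio-visual")    # inter-modality edge at the same step
```

Separating edge types is what lets a model learn distinct message-passing functions for temporal (intra-modality) and cross-modal relationships.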
Video anomaly detection is a challenging task because most anomalies are scarce and non-deterministic. Many approaches investigate the reconstruction difference between normal and abnormal patterns, but neglect that anomalies do not necessarily correspond to large reconstruction errors. To address this issue, we design a Convolutional LSTM autoencoder prediction framework with enhanced spatio-temporal memory exchange using bidirectional and higher-order mechanisms. The bidirectional structure promotes learning temporal regularity through forward and backward predictions. The unique higher-order mechanism further strengthens the spatial information interaction between the encoder and the decoder. Considering the limited receptive fields of Convolutional LSTMs, we also introduce an attention module to highlight informative features for prediction. Anomalies are eventually identified by comparing the frames with their corresponding predictions. Evaluations on three popular benchmarks show that our framework outperforms most prediction-based anomaly detection methods.
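The final scoring step, comparing frames with their predictions, is commonly done via PSNR. The sketch below uses a simple min-fusion of forward and backward prediction quality; this fusion rule is an illustration only, since the paper couples the two directions inside the ConvLSTM autoencoder itself.

```python
import numpy as np

def psnr(frame, pred, eps=1e-8):
    """PSNR between an observed frame and its prediction (values in [0, 1]);
    prediction-based detectors flag frames whose PSNR drops sharply."""
    mse = np.mean((frame - pred) ** 2)
    return 10.0 * np.log10(1.0 / (mse + eps))

def bidirectional_score(frame, fwd_pred, bwd_pred):
    """Take the worse of the forward and backward PSNRs, so an event missed
    in one temporal direction can still be caught in the other."""
    return min(psnr(frame, fwd_pred), psnr(frame, bwd_pred))

f = np.full((4, 4), 0.5)
good = bidirectional_score(f, f + 0.01, f + 0.01)  # well-predicted frame
bad = bidirectional_score(f, f + 0.30, f + 0.01)   # poorly predicted frame
assert good > bad
```

Frames whose score falls below a clip-relative threshold are declared anomalous.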
With the increasing popularity of convolutional neural networks (CNNs), recent works on facial age estimation employ these networks as the backbone. However, state-of-the-art CNN-based methods treat each facial region equally, thus entirely ignoring the importance of some facial patches that may contain rich age-specific information. In this paper, we propose a face-based age estimation framework, called Attention-based Dynamic Patch Fusion (ADPF). In ADPF, two separate CNNs are implemented, namely the AttentionNet and the FusionNet. The AttentionNet dynamically locates and ranks age-specific patches by employing a novel Ranking-guided Multi-Head Hybrid Attention (RMHHA) mechanism. The FusionNet uses the discovered patches along with the face image to predict the age of the subject. Since the proposed RMHHA mechanism ranks the discovered patches based on their importance, the length of the learning path of each patch in the FusionNet is proportional to the amount of information it carries (the longer, the more important). ADPF also introduces a novel diversity loss to guide the training of the AttentionNet and reduce the overlap among patches, so that diverse and important patches are discovered. Through extensive experiments, we show that our proposed framework outperforms state-of-the-art methods on several age estimation benchmark datasets.
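The ranking-then-proportional-capacity rule can be sketched in a few lines. The attention scores here are hand-written stand-ins; in ADPF they would come from the attention mechanism, and "path length" would correspond to network depth rather than an integer.

```python
def rank_patches(patches, scores):
    """Sort patches from most to least important by their attention score."""
    order = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    return [patches[i] for i in order]

def path_lengths(ranked, base=1, step=1):
    """Assign longer learning paths to higher-ranked patches (rank 0 gets
    the longest), mirroring the proportionality rule described above."""
    n = len(ranked)
    return [base + step * (n - 1 - r) for r in range(n)]

ranked = rank_patches(["left eye", "cheek", "brow"], [0.9, 0.1, 0.5])
# ranked == ["left eye", "brow", "cheek"]; path_lengths(ranked) == [3, 2, 1]
```

The diversity loss then penalises overlap among the top-ranked patches so the same region is not selected repeatedly.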
Deep learning is regarded as a promising solution for reversible steganography. Recent developments in end-to-end learning have made it possible to bypass multiple intermediate stages of steganographic operations with a pair of encoder and decoder neural networks. However, this framework is unable to guarantee perfect reversibility, because it is difficult for such monolithic machinery, in the form of a black box, to learn the intricate logic of reversible computing. A more reliable way to develop a learning-based reversible steganographic scheme is through a divide-and-conquer paradigm. Prediction-error modulation is a well-established modular framework consisting of an analytics module and a coding module. The former serves to analyse pixel correlations and predict pixel intensities, while the latter is in charge of the reversible coding mechanism. Given that reversibility is governed independently by the coding module, we narrow our focus to incorporating neural networks into the analytics module. The objective of this study is to evaluate the impact of different training configurations on predictive neural networks and to provide practical insights. Context-aware pixel-intensity prediction plays a central role in reversible steganography and can be perceived as a low-level computer vision task. Therefore, we can adopt neural network models originally designed for such computer vision tasks to perform intensity prediction. Furthermore, we rigorously investigate the impact of intensity initialisation on predictive performance and the influence of distributional shift in dual-layer prediction. Experimental results show that state-of-the-art steganographic performance can be achieved with advanced neural network models.
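The dual-layer prediction setting the passage refers to can be illustrated with the classic checkerboard decomposition: 'cross' pixels are predicted from their 'dot' neighbours, and the coding module then embeds data into the prediction errors. The neighbour-mean rule below is the hand-crafted baseline that a learned predictor would replace.

```python
import numpy as np

def checkerboard_predict(img):
    """Predict each interior 'cross' pixel (i + j even) from the mean of its
    four 'dot' neighbours; 'dot' pixels and borders are left unchanged."""
    img = img.astype(float)
    pred = img.copy()
    h, w = img.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if (i + j) % 2 == 0:  # 'cross' layer
                pred[i, j] = (img[i - 1, j] + img[i + 1, j]
                              + img[i, j - 1] + img[i, j + 1]) / 4.0
    return pred

# On a linear intensity ramp the neighbour mean is exact, so the
# prediction errors (img - pred) vanish on the cross layer.
ramp = np.add.outer(np.arange(6.0), np.arange(6.0))
assert np.allclose(checkerboard_predict(ramp), ramp)
```

Small, sharply peaked prediction errors are what give reversible coding its capacity, which is why better intensity predictors translate directly into better steganographic performance.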
In this work, we address the problem of unsupervised moving object segmentation (MOS) in 4D LiDAR data recorded from a stationary sensor, where no ground truth annotations are involved. Deep learning-based state-of-the-art methods for LiDAR MOS strongly depend on annotated ground truth data, which is expensive to obtain and scarce in existence. To close this gap in the stationary setting, we propose a novel 4D LiDAR representation based on multivariate time series that relaxes the problem of unsupervised MOS to a time series clustering problem. More specifically, we propose modeling the change in occupancy of a voxel by a multivariate occupancy time series (MOTS), which captures spatio-temporal occupancy changes on the voxel level and its surrounding neighborhood. To perform unsupervised MOS, we train a neural network in a self-supervised manner to encode MOTS into voxel-level feature representations, which can be partitioned by a clustering algorithm into moving or stationary. Experiments on stationary scenes from the Raw KITTI dataset show that our fully unsupervised approach achieves performance that is comparable to that of supervised state-of-the-art approaches.
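A voxel's occupancy time series can be sketched directly. As a stand-in for the paper's learned encoder plus clustering, the toy labelling rule below calls a voxel 'moving' when its occupancy flips often; the flip-rate threshold and the single-voxel (neighbourhood-free) series are illustrative simplifications.

```python
import numpy as np

def occupancy_series(scans, voxel):
    """Binary occupancy of one voxel over a sequence of scans, each scan
    given as the set of occupied voxel coordinates."""
    return np.array([1.0 if voxel in scan else 0.0 for scan in scans])

def label(series, flip_rate=0.2):
    """Toy proxy for the clustering step: frequent occupancy flips
    indicate a moving object passing through the voxel."""
    return "moving" if np.abs(np.diff(series)).mean() > flip_rate else "stationary"

# A voxel a car passes through vs. a voxel occupied by a wall.
scans = [{(0, 0, 0)}, {(0, 0, 0), (1, 0, 0)}, {(1, 0, 0)}, {(2, 0, 0)}]
assert label(occupancy_series(scans, (0, 0, 0))) == "moving"
```

The stationary sensor is essential here: with a fixed viewpoint, occupancy changes in a voxel can be attributed to scene motion rather than ego-motion.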
Selecting the number of topics in LDA models is considered to be a difficult task, for which alternative approaches have been proposed. The performance of the recently developed singular Bayesian information criterion (sBIC) is evaluated and compared to the performance of alternative model selection criteria. The sBIC is a generalization of the standard BIC that can be applied to singular statistical models. The comparison is based on Monte Carlo simulations and carried out for several alternative settings, varying with respect to the number of topics, the number of documents, and the size of documents in the corpora. Performance is measured using different criteria which take into account not only the correct number of topics, but also whether the relevant topics from the DGPs are identified. Practical recommendations for LDA model selection in applications are derived.
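As a baseline for the criteria compared above, standard BIC-based topic selection looks as follows. The sBIC generalises this by replacing the k·log(n) penalty with one derived from the model's learning coefficient, which is not computed here; the log-likelihoods and parameter counts below are made-up numbers for illustration only.

```python
import math

def bic(loglik, n_params, n_obs):
    """Standard Bayesian information criterion (lower is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Candidate LDA fits: topic count -> (toy) maximised log-likelihood.
loglik = {2: -10500.0, 5: -10050.0, 10: -10020.0}
n_params = {k: 200 * k for k in loglik}  # toy parameter counts per K
n_obs = 5000
best_k = min(loglik, key=lambda k: bic(loglik[k], n_params[k], n_obs))
```

Even though the likelihood keeps improving with more topics, the penalty term can dominate, which is why penalised criteria rather than raw likelihood are used for selecting the topic count.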
Applying deep learning concepts from image detection and graph theory has greatly advanced protein-ligand binding affinity prediction, a challenge with enormous ramifications for both drug discovery and protein engineering. We build upon these advances by designing a novel deep learning architecture consisting of a 3-dimensional convolutional neural network utilizing channel-wise attention and two graph convolutional networks utilizing attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based Convolutional Neural Network) obtains state-of-the-art results on the PDBbind v.2016 core set, the most widely recognized benchmark in the field. We extensively assess the generalizability of our model using multiple train-test splits, each of which maximizes differences between either protein structures, protein sequences, or ligand extended-connectivity fingerprints. Furthermore, we perform 10-fold cross-validation with a similarity cutoff between SMILES strings of ligands in the training and test sets, and also evaluate the performance of HAC-Net on lower-quality data. We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction. All of our software is available as open source at https://github.com/gregory-kyro/HAC-Net/.
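The channel-wise attention ingredient can be sketched in squeeze-and-excitation style on a voxelised (C, D, H, W) grid: pool each channel to a scalar, pass through a small two-layer gate, and rescale the channels. This is a sketch of the general mechanism only; HAC-Net's actual attention blocks, layer sizes, and weights differ.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation-style gating over the channel axis of a
    (C, D, H, W) tensor; gates lie in (0, 1), so channels are re-weighted."""
    s = x.mean(axis=(1, 2, 3))              # squeeze: one scalar per channel
    h = np.maximum(0.0, w1 @ s)             # excitation, hidden layer (ReLU)
    g = 1.0 / (1.0 + np.exp(-(w2 @ h)))     # sigmoid gate per channel
    return x * g[:, None, None, None]       # rescale channels

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4, 4))           # toy grid with 8 channels
w1, w2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 2))
y = channel_attention(x, w1, w2)
assert y.shape == x.shape
```

Such gating lets the network emphasise feature channels that correlate with binding-relevant interactions while suppressing the rest.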
Counterfactual explanation is a common class of methods to make local explanations of machine learning decisions. For a given instance, these methods aim to find the smallest modification of feature values that changes the predicted decision made by a machine learning model. One of the challenges of counterfactual explanation is the efficient generation of realistic counterfactuals. To address this challenge, we propose VCNet (Variational Counter Net), a model architecture that combines a predictor and a counterfactual generator that are jointly trained, for regression or classification tasks. VCNet is able both to generate predictions and to generate counterfactual explanations without having to solve another minimisation problem. Our contribution is the generation of counterfactuals that are close to the distribution of the predicted class. This is done by learning a variational autoencoder conditioned on the output of the predictor in a joint-training fashion. We present an empirical evaluation on tabular datasets and across several interpretability metrics. The results are competitive with the state-of-the-art method.
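The "smallest modification that flips the decision" has a closed form in the special case of a linear scorer f(x) = w·x + b: project x onto the decision boundary and step just past it. Real counterfactual methods, VCNet included, target non-linear models where no such closed form exists, which is what motivates generative approaches.

```python
import numpy as np

def closest_counterfactual(x, w, b, margin=1e-6):
    """Closest point (in L2) to x with the opposite sign of w.x + b,
    nudged past the boundary by a small margin."""
    w, x = np.asarray(w, float), np.asarray(x, float)
    shift = (w @ x + b) / (w @ w)  # signed distance along w (scaled)
    return x - (shift + np.sign(shift) * margin) * w

w, b = np.array([1.0, -2.0]), 0.5
x = np.array([2.0, 0.0])               # w @ x + b = 2.5 -> class +1
x_cf = closest_counterfactual(x, w, b)
assert (w @ x_cf + b) < 0              # decision flipped
```

Note that this minimal-distance counterfactual ignores data realism entirely; conditioning the generator on the predicted class, as VCNet does, is what keeps counterfactuals close to the data distribution.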
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments of 10 models and 4 augmentation methods on PopQA, our new open-domain QA dataset with 14k questions. We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the tail. We then show that retrieval-augmented LMs largely outperform orders of magnitude larger LMs, while unassisted LMs remain competitive in questions about high-popularity entities. Based on those findings, we devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary. Experimental results show that this significantly improves models' performance while reducing the inference costs.
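The adaptive retrieval idea reduces to a routing rule: call the retriever only for questions about unpopular entities. The popularity proxy (the paper uses Wikipedia page views), the threshold value, and the `lm_answer` / `retrieve_answer` callables below are illustrative stand-ins for the two inference paths.

```python
def adaptive_answer(question, popularity, lm_answer, retrieve_answer,
                    threshold=1000):
    """Route tail-entity questions to retrieval-augmented inference and
    head-entity questions to the LM's parametric memory alone."""
    if popularity < threshold:
        return retrieve_answer(question)  # rare entity: augment with retrieval
    return lm_answer(question)            # popular entity: parametric memory

lm = lambda q: "lm"    # stand-in for plain LM inference
rag = lambda q: "rag"  # stand-in for retrieval-augmented inference
assert adaptive_answer("q1", 50, lm, rag) == "rag"    # tail entity
assert adaptive_answer("q2", 10**6, lm, rag) == "lm"  # head entity
```

Because retrieval is invoked only on the tail, the scheme keeps most queries on the cheap parametric path while recovering accuracy exactly where memorization fails.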